PRoST: Distributed Execution of SPARQL Queries Using Mixed Partitioning Strategies
نویسندگان
چکیده
The rapidly growing size of RDF graphs in recent years necessitates distributed storage and parallel processing strategies. To obtain efficient query processing using computer clusters a wide variety of different approaches have been proposed. Related to the approach presented in the current paper are systems built on top of Hadoop HDFS, for example using Apache Accumulo or using Apache Spark. We present a new RDF store called PRoST (Partitioned RDF on Spark Tables) based on Apache Spark. PRoST introduces an innovative strategy that combines the Vertical Partitioning approach with the Property Table, two preexisting models for storing RDF datasets. We demonstrate that our proposal outperforms state-of-the-art systems w.r.t. the runtime for a wide range of query types and without any extensive precomputing phase.
منابع مشابه
PHD-Store: An Adaptive SPARQL Engine with Dynamic Partitioning for Distributed RDF Repositories
Many repositories utilize the versatile RDF model to publish data. Repositories are typically distributed and geographically remote, but data are interconnected (e.g., the Semantic Web) and queried globally by a language such as SPARQL. Due to the network cost and the nature of the queries, the execution time can be prohibitively high. Current solutions attempt to minimize the network cost by r...
متن کاملSAMUEL: A Sharing-based Approach to processing Multiple SPARQL Queries with MapReduce
The volume of RDF data is now growing tremendously. It is thus considered prudent to store and process massive RDF data with distributed SPARQL engines instead of relying on a singlemachine system.Many sophisticated index and partitioning schemes have also been proposed to support SPARQL query evaluations. However, existing SPARQL engines have mainly followed oneat-a-time scheme so that query e...
متن کاملPartout : a distributed engine for efficient RDF processing Partout : un moteur pour le traitement distribué de données RDF
The increasing interest in Semantic Web technologies has led not only to a rapid growth of semantic data on the Web but also to an increasing number of backend applications with already more than a trillion triples in some cases. Confronted with such huge amounts of data and the future growth, existing state-of-the-art systems for storing RDF and processing SPARQL queries are no longer sufficie...
متن کاملA Hybrid Approach to Linked Data Query Processing with Time Constraints
In addition to RDF data within documents published according to the Linked Data principles, SPARQL endpoints are also a potential source of a great deal of Linked Data. The execution of queries using languages such as SPARQL can use utilise both of these types of data sources. In this paper we present a hybrid approach to answering SPARQL queries that makes use of both link traversal-based and ...
متن کاملProcessing Aggregate Queries in a Federation of SPARQL Endpoints
More andmore RDF data is exposed on theWeb via SPARQL endpoints. With the recent SPARQL 1.1 standard, these datasets can be queried in novel and more powerful ways, e.g., complex analysis tasks involving grouping and aggregation, and even data frommultiple SPARQL endpoints, can now be formulated in a single query. This enables Business Intelligence applications that access data from federated w...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2018